Two emerging hardware trends will dominate the database system technology inthe near future: increasing main memory capacities of several TB per server andmassively parallel multi-core processing. Many algorithmic and controltechniques in current database technology were devised for disk-based systemswhere I/O dominated the performance. In this work we take a new look at thewell-known sort-merge join which, so far, has not been in the focus of researchin scalable massively parallel multi-core data processing as it was deemedinferior to hash joins. We devise a suite of new massively parallel sort-merge(MPSM) join algorithms that are based on partial partition-based sorting.Contrary to classical sort-merge joins, our MPSM algorithms do not rely on ahard to parallelize final merge step to create one complete sort order. Ratherthey work on the independently created runs in parallel. This way our MPSMalgorithms are NUMA-affine as all the sorting is carried out on local memorypartitions. An extensive experimental evaluation on a modern 32-core machinewith one TB of main memory proves the competitive performance of MPSM on largemain memory databases with billions of objects. It scales (almost) linearly inthe number of employed cores and clearly outperforms competing hash joinproposals - in particular it outperforms the "cutting-edge" Vectorwise parallelquery engine by a factor of four.
展开▼